
1 Purpose

To extract and visualise tweets and re-tweets of #dockercon for 17 - 21 April, 2017 (DockerCon17).

The approach borrows extensively from http://thinktostart.com/twitter-authentification-with-r/

2 Load Data

The data should already have been downloaded using collectData.R. After some processing, this produces a data table with the following variables:

##  [1] "text"             "favorited"        "favoriteCount"   
##  [4] "replyToSN"        "created"          "truncated"       
##  [7] "replyToSID"       "id"               "replyToUID"      
## [10] "statusSource"     "screenName"       "retweetCount"    
## [13] "isRetweet"        "retweeted"        "longitude"       
## [16] "latitude"         "location"         "language"        
## [19] "profileImageURL"  "createdLocal"     "obsDateTimeMins" 
## [22] "obsDateTimeHours" "obsMin"           "obsHour"         
## [25] "obsQH"            "obsQHour"         "obsDateTime5m"   
## [28] "obsDateTime10m"   "obsDateTime15m"
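The obsDateTime5m, obsDateTime10m and obsDateTime15m variables look like timestamps floored to fixed-width bins. The processing code is not shown here, so the following is only a minimal base R sketch of one way to derive such bins. The helper floorToMinutes and the toy timestamps are hypothetical; lubridate's floor_date() would be a natural alternative.

```r
# Toy timestamps (illustrative only, not taken from the real data set)
created <- as.POSIXct(c("2017-04-17 14:03:27",
                        "2017-04-17 14:07:59",
                        "2017-04-17 14:14:00"), tz = "UTC")

# Hypothetical helper: floor a POSIXct vector to the start of its n-minute bin
# by rounding down the number of seconds since the epoch
floorToMinutes <- function(x, n) {
  binSecs <- n * 60
  as.POSIXct(floor(as.numeric(x) / binSecs) * binSecs,
             origin = "1970-01-01", tz = "UTC")
}

obsDateTime5m  <- floorToMinutes(created, 5)
obsDateTime15m <- floorToMinutes(created, 15)

format(obsDateTime5m, "%H:%M")   # "14:00" "14:05" "14:10"
format(obsDateTime15m, "%H:%M")  # "14:00" "14:00" "14:00"
```

Flooring on epoch seconds keeps the bins aligned for any time zone whose UTC offset is a whole number of hours, which is the case for Austin.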

The table has 6,912 tweets (and 8,714 re-tweets) from 5,294 tweeters between 2017-04-16 19:01:03 and 2017-04-19 15:38:54 (Central Daylight Time).
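A minimal sketch of how summary figures of this kind could be computed in base R. The toy data frame is invented for illustration, and it assumes that created is stored in UTC and that createdLocal is the same instant shown in Austin local time ("America/Chicago"); neither assumption is confirmed by the text above.

```r
# Toy data (illustrative only); column names follow the variable list above
tweets <- data.frame(
  screenName = c("alice", "bob", "alice", "carol"),
  isRetweet  = c(FALSE, TRUE, FALSE, TRUE),
  created    = as.POSIXct(c("2017-04-17 00:01:03", "2017-04-17 15:00:00",
                            "2017-04-18 12:30:00", "2017-04-19 20:38:54"),
                          tz = "UTC"),
  stringsAsFactors = FALSE
)

nTweets   <- sum(!tweets$isRetweet)             # original tweets
nRetweets <- sum(tweets$isRetweet)              # re-tweets
nTweeters <- length(unique(tweets$screenName))  # distinct accounts

# Same instants, displayed in local (Austin) time rather than UTC
firstLocal <- format(min(tweets$created), tz = "America/Chicago", usetz = TRUE)
lastLocal  <- format(max(tweets$created), tz = "America/Chicago", usetz = TRUE)

cat(nTweets, "tweets and", nRetweets, "re-tweets from", nTweeters,
    "tweeters between", firstLocal, "and", lastLocal, "\n")
```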

3 Analysis

3.1 Tweets and Tweeters over time

All (re)tweets containing #dockercon, 2017-04-17 to 2017-04-19 (FALSE = tweets, TRUE = re-tweets)
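The counts behind a plot like this can be sketched as follows, assuming the tweets are binned by obsDateTime15m and split by isRetweet. The bin width actually used for the chart is not stated, and the toy data are invented for illustration.

```r
# Toy data (illustrative only)
tweets <- data.frame(
  obsDateTime15m = as.POSIXct(c("2017-04-17 09:00:00", "2017-04-17 09:00:00",
                                "2017-04-17 09:15:00", "2017-04-17 09:15:00",
                                "2017-04-17 09:15:00"), tz = "UTC"),
  isRetweet  = c(FALSE, TRUE, TRUE, TRUE, FALSE),
  screenName = c("alice", "bob", "carol", "bob", "alice"),
  stringsAsFactors = FALSE
)

binLabel <- format(tweets$obsDateTime15m, "%Y-%m-%d %H:%M")

# Number of (re)tweets per time bin, split by tweet / re-tweet
counts <- table(bin = binLabel, isRetweet = tweets$isRetweet)

# Number of distinct tweeters per time bin
tweetersPerBin <- tapply(tweets$screenName, binLabel,
                         function(s) length(unique(s)))

# Long format (one row per bin and type), convenient for plotting
countsLong <- as.data.frame(counts, responseName = "nTweets")
```

The long-format table has one row per bin and isRetweet value, which is the shape a stacked or coloured ggplot2 chart would typically expect.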

3.2 Location (lat/long)

We would like to make a nice map, but sadly most tweets have no lat/long set: only 45 of the 15,626 (re)tweets carry coordinates.

All logged lat/long values
latitude   longitude   nTweets
NA         NA            15581
30.26250   -97.74010        25
30.26037   -97.73848         2
30.25820   -97.71264         1
30.25888   -97.73841         2
30.25971   -97.73940         1
30.26006   -97.73813         1
30.26006   -97.73859         1
30.26720   -97.76390         2
30.26036   -97.73848         1
30.26356   -97.73993         1
30.26416   -97.73961         2
30.26623   -97.74328         1
30.26857   -97.73617         1
30.26471   -97.74174         1
30.20227   -97.66723         1
42.36488   -71.02168         1
37.61698   -122.38428        1
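A minimal sketch of how the share of geo-tagged tweets and the per-coordinate counts could be computed in base R. The toy data frame is invented for illustration; only the column names latitude and longitude come from the variable list.

```r
# Toy data (illustrative only)
tweets <- data.frame(
  latitude  = c(NA, NA, 30.2625, 30.2625, 42.36488),
  longitude = c(NA, NA, -97.7401, -97.7401, -71.02168)
)

# A tweet is usable for mapping only if both coordinates are present
hasCoords <- !is.na(tweets$latitude) & !is.na(tweets$longitude)
nGeo   <- sum(hasCoords)
pctGeo <- 100 * mean(hasCoords)

# Number of tweets per distinct lat/long pair (geo-tagged tweets only)
geoTweets <- transform(tweets[hasCoords, ], nTweets = 1)
geoCounts <- aggregate(nTweets ~ latitude + longitude,
                       data = geoTweets, FUN = sum)
```

Checking the geo-tagged share before plotting makes it clear up front whether a map would represent more than a handful of tweets.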

3.3 Location (textual)

The location field appears to be pulled from the user’s profile, although it may also be a ‘guesstimate’ of their current location.

Top locations for tweets:

Top 15 locations for tweeting
location              nTweets
NA                       2310
San Francisco, CA        1162
San Francisco             451
Austin, TX                283
Seattle, WA               199
Silicon Valley, CA        188
Paris                     144
Islamabad, Pakistan       137
London                    119
New York, NY              118
Charlotte, NC             113
USA                       103
west tokyo                103
San Jose, CA               98
Boulder, CO                97

Top locations for tweeters:

Top 15 locations for tweeters
location              nTweeters
NA                          966
San Francisco, CA           162
Austin, TX                   76
San Francisco                61
Seattle, WA                  44
San Jose, CA                 40
New York, NY                 40
Paris                        35
London, England              29
New York                     27
Paris, France                27
London                       25
Palo Alto, CA                25
Dallas, TX                   23
Washington, DC               23
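The two tables differ because one counts tweets and the other counts distinct tweeters. A minimal base R sketch of that distinction follows; the toy data are invented for illustration, and it assumes each screen name has a single location value.

```r
# Toy data (illustrative only)
tweets <- data.frame(
  screenName = c("alice", "alice", "alice", "bob", "carol", "dave"),
  location   = c("San Francisco, CA", "San Francisco, CA", "San Francisco, CA",
                 "Austin, TX", "Austin, TX", NA),
  stringsAsFactors = FALSE
)

# Tweets per location: every row counts, missing locations kept as NA
tweetsByLoc <- sort(table(tweets$location, useNA = "ifany"),
                    decreasing = TRUE)

# Tweeters per location: reduce to one row per screen name first
tweeters      <- unique(tweets[, c("screenName", "location")])
tweetersByLoc <- sort(table(tweeters$location, useNA = "ifany"),
                      decreasing = TRUE)

head(tweetsByLoc, 15)    # top 15 locations by number of tweets
head(tweetersByLoc, 15)  # top 15 locations by number of tweeters
```

In the toy data one prolific account puts San Francisco first by tweets, while Austin leads by tweeters. Because location is free text, variants such as "San Francisco" and "San Francisco, CA" are counted separately, as they are in the tables above.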

3.4 Screen name

Next we’ll try by screen name.

Top tweeters:

Top 15 tweeters
screenName       nTweets
DockerCon            300
theCUBE              154
BettyJunod           119
climbingkujira       117
jpetazzo             108
solomonstre          104
jeanepaul            103
ManoMarks             82
OpenShiftNinja        82
sitspak               81
SFoskett              78
vmblog                77
jameskobielus         77
kaslinfields          74
bsmith626             73

And here’s a really bad visualisation of all of them!

N tweets per 5 minutes by screen name

So let’s re-do that for the top 50 tweeters.

N tweets per 5 minutes by screen name (top 50)
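Restricting the plot to the most active accounts could be sketched as below: rank screen names by tweet count, keep the top N, then count tweets per 5-minute bin for those accounts only. The toy data and the variable names topN, topTweeters and counts are hypothetical; the report uses the top 50, while the example uses 2 so the effect is visible.

```r
# Toy data (illustrative only)
tweets <- data.frame(
  screenName    = c("alice", "alice", "alice", "bob", "bob", "carol"),
  obsDateTime5m = as.POSIXct(c("2017-04-17 09:00:00", "2017-04-17 09:00:00",
                               "2017-04-17 09:05:00", "2017-04-17 09:00:00",
                               "2017-04-17 09:05:00", "2017-04-17 09:05:00"),
                             tz = "UTC"),
  stringsAsFactors = FALSE
)

topN <- 2  # the report uses 50

# Rank screen names by number of tweets and keep the first topN
perTweeter  <- sort(table(tweets$screenName), decreasing = TRUE)
topTweeters <- names(perTweeter)[seq_len(min(topN, length(perTweeter)))]

# Keep only tweets from those accounts, then count per 5-minute bin
topTweets <- tweets[tweets$screenName %in% topTweeters, ]
counts <- as.data.frame(
  table(screenName    = topTweets$screenName,
        obsDateTime5m = format(topTweets$obsDateTime5m, "%H:%M")),
  responseName = "nTweets", stringsAsFactors = FALSE
)
```

The resulting table has one row per screen name and time bin, which suits a tile or heat-map style plot with screen name on one axis and time on the other.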

4 About

Analysis completed in: 51.01 seconds using knitr in RStudio with R version 3.3.3 (2017-03-06) running on x86_64-apple-darwin13.4.0.

A special mention must go to twitteR (Gentry, n.d.) for the Twitter API interaction functions and to lubridate (Grolemund and Wickham 2011), which allows timezone manipulation without tears.

Other R packages used:

  • base R - for the basics (R Core Team 2016)
  • data.table - for fast (big) data handling (Dowle et al. 2015)
  • readr - for nice data loading (Wickham, Hester, and Francois 2016)
  • ggplot2 - for slick graphs (Wickham 2009)
  • plotly - fancy, zoomable slick graphs (Sievert et al. 2016)
  • knitr - to create this document (Xie 2016)

References

Dowle, M, A Srinivasan, T Short, S Lianoglou with contributions from R Saporta, and E Antonyan. 2015. Data.table: Extension of Data.frame. https://CRAN.R-project.org/package=data.table.

Gentry, Jeff. n.d. TwitteR: R Based Twitter Client. http://lists.hexdump.org/listinfo.cgi/twitter-users-hexdump.org.

Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.

R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Sievert, Carson, Chris Parmer, Toby Hocking, Scott Chamberlain, Karthik Ram, Marianne Corvellec, and Pedro Despouy. 2016. Plotly: Create Interactive Web Graphics via ’Plotly.js’. https://CRAN.R-project.org/package=plotly.

Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.

Wickham, Hadley, Jim Hester, and Romain Francois. 2016. Readr: Read Tabular Data. https://CRAN.R-project.org/package=readr.

Xie, Yihui. 2016. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://CRAN.R-project.org/package=knitr.